Naoaki Okazaki

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

May 05, 2025
Via arXiv

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Mar 31, 2025
Via arXiv

Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models

Mar 08, 2025
Via arXiv

LCTG Bench: LLM Controlled Text Generation Benchmark

Jan 27, 2025
Via arXiv

HarmonicEval: Multi-modal, Multi-task, Multi-criteria Automatic Evaluation Using a Vision Language Model

Dec 19, 2024
Via arXiv

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

Dec 19, 2024
Via arXiv

Constructing Multimodal Datasets from Scratch for Rapid Development of a Japanese Visual Language Model

Oct 30, 2024
Via arXiv

Tokenization as Finite-State Transduction

Oct 21, 2024
Via arXiv

Distributional Properties of Subword Regularization

Aug 21, 2024
Via arXiv

HMoE: Heterogeneous Mixture of Experts for Language Modeling

Aug 20, 2024
Via arXiv